Maximum entropy models are increasingly being used to describe the collective activity of neural
populations with measured mean neural activities and pairwise correlations, but the full space of 
probability distributions consistent with these constraints has not been explored. We provide lower 
and upper bounds on the entropy for both the minimum and maximum entropy distributions over 
binary units with fixed mean and pairwise correlation, and we construct distributions for several 
relevant cases. Surprisingly, the minimum entropy solution has entropy scaling logarithmically with 
system size, unlike the linear behavior of the maximum entropy solution, resolving an open question 
in neuroscience. Our results show how only small amounts of randomness are needed to mimic 
low-order statistical properties of highly entropic distributions, and we discuss some applications for 
engineered and biological information transmission systems. 



Maximum entropy models are central to the study of physical systems in thermal
equilibrium [1], and they have recently been found to model protein folding [2, 3],
antibody diversity [4], neural population activity [5-11], and even flock behavior [12]
quite well (cf. [13]). This is perhaps surprising since the usual physical arguments
involving ergodicity or equality among energetically accessible states are not obviously
applicable for such systems, though such models have been justified in terms of imposing
no structure beyond what is explicitly measured [3, 13]. Conversely, it is not clear to
what extent this good agreement was inevitable. If the space of distributions were
sufficiently constrained by observations, then agreement would be an unavoidable
consequence of the constraints rather than a consequence of the unique suitability of
the maximum entropy model. In neuroscience, there is also controversy [5, 14-17] over
the notion that small pairwise correlations can conspire to constrain the behavior of
large neural ensembles, and it has been shown [14, 15] that pairwise models do not
always allow accurate extrapolation from small populations to large ensembles.

Previous authors have studied these issues with maximum entropy models expanded to
second- [5], third- [15], and fourth-order [17]. Here we use non-perturbative methods to
derive rigorous upper and lower bounds on the entropy of the minimum entropy
distribution for fixed means and pairwise correlations, and we construct explicit low
and high entropy models for the full range of possible uniform first- and second-order
constraints (Eqs. (3)-(6); Figs. 1, 2). Interestingly, we find that entropy differences
between models with the same first- and second-order statistics can be nearly as large
as is possible between any two arbitrary distributions. Thus, entropy is only weakly
constrained by these statistics, and the success of maximum entropy models in biology
[2-12], when it occurs for large enough systems [15], represents a real triumph of the
maximum entropy approach.

Our results demonstrate that empirically measured first-, second-, and third-order
statistics are essentially inconsequential for testing coding optimality in a broad
class of engineered information transmission systems, whereas the existence of other
statistical properties, such as finite exchangeability [18], does guarantee information
transmission near channel capacity [19, 20], the maximum possible information rate given
the properties of the information channel. A better understanding of minimum entropy
distributions subject to constraints is also important for minimal state space
realization [21], a form of optimal model selection based on an interpretation of
Occam's Razor complementary to that of Jaynes [1]. Our results also have implications
for computer science, as algorithms for generating binary random variables with low
entropy have found many applications (e.g., [22-36]).

Consider an abstract description of a neural ensemble consisting of $N$ spiking neurons.
In any given time bin, each neuron $i$ has binary state $s_i$ denoting whether it is
currently firing an action potential ($s_i = 1$) or not ($s_i = 0$). The state of the
full network is represented by $s = (s_1, \ldots, s_N) \in \{0, 1\}^N$. Let $p(s)$ be
the probability of state $s$ so that the distribution over all $2^N$ states of the
system is represented by $\mathbf{p} \in [0, 1]^{2^N}$, $\sum_s p(s) = 1$.

In neural studies using maximum entropy models, electrophysiologists typically measure
the time-averaged firing rates $\mu_i = \langle s_i \rangle$ and pairwise event rates
$\nu_{ij} = \langle s_i s_j \rangle$ and fit the maximum entropy model consistent with
these constraints, yielding a Boltzmann distribution for an Ising spin glass [37]. This
"inverse" problem of inferring the interaction and magnetic field terms in an
FIG. 1. The minimum entropy grows no faster than logarithmically with the system size
$N$ for any mean activity level $\mu$ and pairwise correlation strength $\nu$. (a) In a
parameter regime relevant for neural population activity in the retina [5, 6]
($\mu = 0.1$, $\nu = 0.011$), we can construct an explicit low entropy solution
($\tilde{S}_2^{\mathrm{con2}}$) that grows logarithmically with $N$, unlike the linear
behavior of the maximum entropy solution ($S_2$). (b) Even for mean activities and
pairwise correlations matched to the global maximum entropy solution ($S_2$;
$\mu = 1/2$, $\nu = 1/4$), we can construct explicit low entropy solutions
($\tilde{S}_2^{\mathrm{con}}$ and $\tilde{S}_2^{\mathrm{con2}}$) and a lower bound
($\tilde{S}_2^{\mathrm{lo}}$) on the entropy that each grow logarithmically with $N$, in
contrast to the linear behavior of the maximum entropy solution ($S_2$) and the finitely
exchangeable minimum entropy solution ($\tilde{S}_2^{\mathrm{exch}}$). $\tilde{S}_1$ is
the minimum entropy distribution that is consistent with the mean firing rates. It
remains constant as a function of $N$.

Ising spin glass Hamiltonian that produce the measured means and correlations is
nontrivial, but there has been progress [17, 38-42]. The maximum entropy distribution is
not the only one consistent with these observed statistics, however. In fact, there are
many others, and we will refer to the complete set of these as the "solution space" for
a given set of constraints. Little is known about the minimum entropy permitted for a
particular solution space.
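
The quantities introduced so far (state probabilities $p(s)$, mean rates, pairwise
rates, and entropy in bits) can be computed directly for small systems. The sketch
below is only an illustration under our own assumptions: the function name
`moments_and_entropy` and the two toy distributions for $N = 3$ are hypothetical and
are not taken from the analysis in this paper. It shows two members of one solution
space that share all first- and second-order statistics but differ in entropy.

```python
from itertools import product
from math import log2


def moments_and_entropy(p, n):
    """Return (mu, nu, S) for a distribution p over binary words of length n.

    p maps each word (a tuple of 0/1) to its probability; absent words have
    probability zero.
    """
    mu = [0.0] * n                      # mu[i] = <s_i>
    nu = [[0.0] * n for _ in range(n)]  # nu[i][j] = <s_i s_j> for i != j
    entropy = 0.0                       # Shannon entropy in bits
    for s, prob in p.items():
        if prob == 0.0:
            continue
        entropy -= prob * log2(prob)
        for i in range(n):
            if s[i]:
                mu[i] += prob
                for j in range(n):
                    if j != i and s[j]:
                        nu[i][j] += prob
    return mu, nu, entropy


n = 3
# Maximum entropy member: all 2**n words equally likely (independent fair bits).
uniform = {s: 1.0 / 2**n for s in product((0, 1), repeat=n)}
# Another member of the same solution space: only the even-parity words.
even_parity = {s: 0.25 for s in product((0, 1), repeat=n) if sum(s) % 2 == 0}

mu_u, nu_u, S_u = moments_and_entropy(uniform, n)
mu_e, nu_e, S_e = moments_and_entropy(even_parity, n)
print(mu_u == mu_e, nu_u == nu_e)  # same first- and second-order statistics
print(S_u, S_e)                    # different entropies
```

Both toy distributions have mean rate 1/2 and pairwise rate 1/4, so measurements of
means and pairwise correlations alone cannot distinguish them.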

Our question is, given a set of observed mean firing rates and pairwise correlations
between neurons, what are the possible entropies for the system? We will denote the
maximum (minimum) entropy compatible with a given set of imposed correlations up to
order $n$ by $S_n$ ($\tilde{S}_n$). The maximum entropy framework [5] provides a
hierarchical representation of neural activity: as increasingly higher order
correlations are measured, the corresponding model entropy $S_n$ is reduced until, at
least in principle, it reaches a lower limit. Here we introduce a complementary,
minimum entropy framework: as higher order correlations are specified, the
corresponding model entropy $\tilde{S}_n$ is increased until all correlations are known.
The range of possible entropies for any given set of constraints is the gap
($S_n - \tilde{S}_n$) between these two model entropies, and our primary concern is
whether this gap is greatly reduced for any observed first- or second-order statistics
for any system size $N$. We find that the gap grows linearly with $N$, up to a
logarithmic correction.



We will restrict ourselves here to symmetric constraints; that is, values of mean
firing rates and pairwise correlations are uniform:

$$\mu_i = \mu, \quad \text{for all } i = 1, \ldots, N, \tag{1}$$
$$\nu_{ij} = \nu, \quad \text{for all } i \neq j. \tag{2}$$

Given symmetric constraints, we find the following bounds on the maximum and minimum
entropies for fixed values of $\mu$ and $\nu$. For the maximum entropy:

$$(1 - x)N - C_1(\mu, \nu) \le S_2 \le N, \tag{3}$$

where $x = \frac{\nu - \mu^2}{\mu - \mu^2}$ and $C_1$ is a constant that only depends
on $\mu$ and $\nu$. For the minimum entropy:

$$\log_2\left(\frac{N}{1 + (N - 1)\alpha(\mu, \nu)}\right) \le \tilde{S}_2 \le \log_2\left(N(2N - 1)\right), \tag{4}$$

where $\alpha(\mu, \nu) = (4(\nu - \mu) + 1)^2$. In most cases, the lower bound in
Eq. (4) asymptotes to a constant for large $N$, but in the special case where $\mu$ and
$\nu$ have values consistent with independent neurons ($\mu = 1/2$ and $\nu = 1/4$), we
can give the tighter bound:

$$\log_2(N) \le \tilde{S}_2 \le \log_2(N) + 2. \tag{5}$$

An important class of probability distributions are the exchangeable distributions
[18], which have the property that the probability of a sequence of ones and zeros is
only a function of the number of ones in the binary string. We have constructed a
family of exchangeable distributions that we conjecture is a minimum entropy
exchangeable solution with entropy $\tilde{S}_2^{\mathrm{exch}}$ that scales linearly
with $N$:

$$C_2(\mu, \nu)N - O(\log_2 N) \le \tilde{S}_2^{\mathrm{exch}} \le C_3(\mu, \nu)N, \tag{6}$$

where $C_2$ and $C_3$ are constants that only depend on $\mu$ and $\nu$. We have
empirically confirmed that this is indeed a minimum entropy exchangeable solution for
$N < 200$.

We obtained these bounds by constructing families of low entropy distributions and
exploiting the geometry of the entropy function. Entropy is a strictly concave function
of the probabilities and therefore has a unique maximum that can be identified using
standard methods [43], at least for sufficiently small or symmetric systems. Indeed, it
is easy to show (see Appendix B) that the maximum entropy $S_2$ for any system with
specified mean and pairwise correlation scales linearly with $N$ (Eq. (3), Fig. 1).

By contrast, the minimum entropy distribution exists at a vertex of the allowed space,
where most states have probability zero ([44]; see Appendix C). Our challenge then is
to determine in which vertex (or vertices) a minimum resides. The entropy function is
nonlinear, precluding approaches from linear programming, and the dimensionality of the
probability space grows exponentially with $N$, making exhaustive search and gradient
descent techniques intractable for $N > 5$. Fortunately, we can compute a lower bound
$\tilde{S}_2^{\mathrm{lo}}$ on the entropy of the minimum entropy solution for all $N$
(Fig. 1), and we have constructed two families of explicit solutions with low entropies
($\tilde{S}_2^{\mathrm{con}}$ and $\tilde{S}_2^{\mathrm{con2}}$; Figs. 1, 2) for a broad
parameter regime covering all allowed values for $\mu$ and $\nu$.

Using the concavity of the entropy function together with Jensen's inequality, one can
derive an upper bound on the entropy [20], but similar methods also allow us to obtain
a lower bound (Eq. (4)) on the entropy (see Appendix H):

$$\tilde{S}_2(N, \mu, \nu) \ge \tilde{S}_2^{\mathrm{lo}} = \log_2\left(\frac{N}{1 + (N - 1)\alpha(\mu, \nu)}\right), \tag{7}$$

where $\alpha(\mu, \nu) = (4(\nu - \mu) + 1)^2$, and $\tilde{S}_2(N, \mu, \nu)$ is the
minimum entropy given a network of size $N$ with constraints $\mu$ and $\nu$. This
lower bound asymptotes to the constant value $\log_2(1/\alpha(\mu, \nu))$ as $N$
becomes large except for the special case:

$$\nu = \mu - 1/4, \tag{8}$$

where $\alpha$ vanishes. In the large $N$ limit, we have the inequality
$\mu \ge \nu \ge \mu^2$ (see Appendix A), so the only values of $\mu$ and $\nu$
satisfying Eq. (8) are

$$\mu = 1/2, \quad \nu = \mu^2 = 1/4. \tag{9}$$

In this particular case, the lower bound Eq. (7) scales logarithmically with $N$,
rather than as a constant, but for large systems this difference is insignificant
compared with the linear dependence $S_0 = N$ of the maximum entropy solution (i.e.,
$N$ fair i.i.d. Bernoulli variables).

In addition to this lower bound, we can also construct probability distributions that
provide upper bounds on the entropy of a minimum entropy solution. Each of these
solutions has an entropy that grows logarithmically with $N$ (see Appendices E, ??;
Eqs. (4)-(5)):

$$\tilde{S}_2^{\mathrm{con2}} \le \log_2\left(\lceil N \rceil_p (\lceil N \rceil_p - 1)\right) - 1 \le \log_2\left(N(2N - 1)\right), \tag{10}$$
$$\tilde{S}_2^{\mathrm{con}} = \lceil \log_2(N) + 1 \rceil \le \log_2(N) + 2, \tag{11}$$

where $\lceil \cdot \rceil$ is the ceiling function and $\lceil \cdot \rceil_p$
represents the smallest prime at least as large as its argument. Thus, there is always
a solution whose entropy grows no faster than logarithmically with the size of the
system, for any observed levels of mean activity and pairwise correlation.
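
To get a feel for the sizes involved, the bounds can be evaluated numerically. The
following is a minimal sketch assuming the lower bound
$\log_2\left(N / (1 + (N - 1)\alpha)\right)$ with $\alpha = (4(\nu - \mu) + 1)^2$ from
Eq. (7) and the upper bounds $\log_2(q(q - 1)) - 1$ (with $q$ the smallest prime at
least $N$) and $\lceil \log_2(N) + 1 \rceil$ from Eqs. (10) and (11). All function
names are our own, and the trial-division prime search is chosen for brevity rather
than speed.

```python
from math import ceil, log2


def is_prime(m):
    """Trial-division primality test (adequate for small m)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def smallest_prime_at_least(n):
    m = max(n, 2)
    while not is_prime(m):
        m += 1
    return m


def alpha(mu, nu):
    return (4.0 * (nu - mu) + 1.0) ** 2


def lower_bound(n, mu, nu):
    """Lower bound on the minimum entropy, Eq. (7), in bits."""
    return log2(n / (1.0 + (n - 1) * alpha(mu, nu)))


def upper_bound_con2(n):
    """First upper bound in Eq. (10), in bits."""
    q = smallest_prime_at_least(n)
    return log2(q * (q - 1)) - 1.0


def entropy_con(n):
    """Entropy of the construction in Eq. (11), in bits."""
    return ceil(log2(n) + 1.0)


for n in (10, 100, 1000):
    print(n,
          lower_bound(n, 0.1, 0.011),   # neural regime: approaches a constant
          upper_bound_con2(n),          # grows like 2 log2(N)
          lower_bound(n, 0.5, 0.25),    # special case: equals log2(N)
          entropy_con(n))               # at most log2(N) + 2
```

Every printed value stays far below the maximum entropy, which is linear in $N$.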

As illustrated in Fig. 1a, for large binary systems with first- and second-order
statistics matched to those of many neural populations, which have low firing rates and
correlations slightly above chance ([5-11]; $\mu = 0.1$, $\nu = 0.011$), the range of
possible entropies grows almost linearly with $N$, despite the highly symmetric
constraints imposed (Eqs. (1) and (2)).
FIG. 2. Minimum and maximum entropy models for symmetric constraints. (a) Entropy as a
function of the strength of pairwise correlations for the maximum entropy model
($S_2$), finitely exchangeable minimum entropy model ($\tilde{S}_2^{\mathrm{exch}}$),
and a constructed low entropy solution ($\tilde{S}_2^{\mathrm{con}}$), all
corresponding to $\mu = 1/2$ and $N = 5$. Filled circles indicate the global minimum
$\tilde{S}_1$ and maximum $S_1$ for $\mu = 1/2$. (b)-(d) Support for $S_2$ (b),
$\tilde{S}_2^{\mathrm{exch}}$ (c), and $\tilde{S}_2^{\mathrm{con}}$ (d) corresponding to
the three curves in panel (a). States are grouped by the number of active units; darker
regions indicate higher total probability for each group of states. (e)-(h) Same as for
panels (a) through (d), but with $N = 30$. Note that, with rising $N$, the cusps in the
$\tilde{S}_2^{\mathrm{exch}}$ curve become much less pronounced.



Consider the special case of first- and second-order constraints (Eq. (9)) that
correspond to the unconstrained global maximum entropy distribution. For these highly
symmetric constraints, both our upper and lower bounds on the minimum entropy grow
logarithmically with $N$, rather than just the upper bound as we found for the neural
regime (Fig. 1a). In fact, we have constructed an explicit solution (Eq. (11);
Figs. 1b, 2a,d,e,h; Appendix F), whose entropy $\tilde{S}_2^{\mathrm{con}}$ is never
more than two bits above our lower bound (Eq. (7)) for all $N$. Clearly then, these
constraints alone do not guarantee a level of independence of the neural activities
commensurate with the maximum entropy distribution. By varying the relative
probabilities of states in this explicit construction we can make it satisfy a much
wider range of $\mu$ and $\nu$ values covering most of the allowed region (see
Appendix G) while still remaining a distribution whose entropy grows only
logarithmically with $N$.

The large gap between $\tilde{S}_2^{\mathrm{exch}}$ and $\tilde{S}_2$ demonstrates that
a distribution can dramatically reduce its entropy if it is allowed to violate the
symmetries present in the constraints. This is reminiscent of other examples of
symmetry-breaking in physics for which a system finds an equilibrium that breaks
symmetries present in the physical laws. However, here the situation is in a sense
reversed: Observed statistics obeying a symmetry (the observations about the system)
are produced by an underlying model that does not.

We now examine consequences for engineered communication systems. Specifically,
consider a device such as a digital camera that exploits compressed sensing [45, 46]
to reduce the dimensionality of its image representations. A compressed sensing scheme
involves taking inner products between the vector of raw pixel values and a set of
random vectors, followed by a digitizing step to output $N$-bit strings. Theorems exist
for expected information rates of compressed sensing systems, but we are unaware
of any that do not depend on some knowledge about 
the input signal, such as its sparse structure [13 ITT] . 
Without such knowledge, it would be desirable to know 
which empirically measured output statistics could tell 
us whether such a camera is utilizing as much of the N 
bits of channel capacity as possible for each photograph. 

As we have shown, even if the mean of each bit is $\mu = 1/2$, and the second- and
third-order correlations are at chance level ($\nu = 1/4$;
$\langle s_i s_j s_k \rangle = 1/8$, for distinct $i, j, k$), consistent with the
maximum entropy distribution, it is possible that the Shannon mutual information shared
by the original pixel values and the compressed signal is only on the order of
$\log_2(N)$ bits, well below the channel capacity ($N$ bits) of this (noiseless) output
stream. We emphasize that, in such a system, the transmitted information is limited not
by corruption due to noise, which can be neglected for many applications involving
digital electronic devices, but instead by the nature of the second- and higher-order
correlations in the output.

Thus, measuring pairwise or even triplet-wise correlations between all bit pairs and
triplets is insufficient to provide a useful floor on the information rate, no matter
what values are empirically observed. However, measuring the extent to which other
statistical properties are obeyed can yield strong guarantees of system performance. In
particular, exchangeability is one such constraint. Fig. 1 illustrates the near linear
behavior of the lower bound on information ($\tilde{S}_2^{\mathrm{exch}}$) for
distributions obeying exchangeability, in both the neural regime (cyan curve, panel
(a)) and the regime relevant for our engineering example (cyan curve, panel (b)). We
experimentally find that any exchangeable distribution has as much entropy as the
maximum entropy solution, up to terms of order $\log_2(N)$ (see Appendix D).

In computer science, it is sometimes possible to construct efficient deterministic
algorithms from randomized ones by utilizing low entropy distributions. One common
technique is to replace the independent binary random variables used in a randomized
algorithm with those satisfying only pairwise independence [48]. In many cases,
such a randomized algorithm can be shown to succeed 
even if the original independent random bits are replaced 
by pairwise independent ones having significantly less en- 
tropy. In particular, efficient derandomization can be 
accomplished in these instances by finding pairwise inde- 
pendent distributions with small sample spaces. Several 
such designs are known and use tools from finite fields 
and linear codes [27l EH I49H5T] , combinatorial block de- 
signs [26l [52] , Hadamard matrix theory [36l [53] , and lin- 
ear programming [35] . among others. Our construction 
here of a pairwise independent distribution with entropy
$\tilde{S}_2^{\mathrm{con}}$ adds to this literature and is completely elementary.
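
For orientation, a small sample space carrying pairwise independent fair bits can be
built in a few lines. The sketch below uses the standard textbook construction based on
parities (XORs) of subsets of $k$ uniform seed bits. It is not the construction of
Appendix F, and the function name is hypothetical. It yields $N = 2^k - 1$ bits with
mean 1/2 and pairwise rate 1/4 from only $2^k$ equally likely words, so its entropy is
$k = \log_2(N + 1)$ bits.

```python
from math import log2


def xor_pairwise_independent(k):
    """Return the 2**k equally likely words of N = 2**k - 1 bits each.

    Output bit m (m = 1 .. 2**k - 1, read as a bitmask) is the parity of the
    seed bits selected by m.
    """
    masks = range(1, 2**k)  # nonempty subsets of the k seed bits
    words = []
    for seed in range(2**k):
        word = tuple(bin(seed & m).count("1") % 2 for m in masks)
        words.append(word)
    return words


k = 4
words = xor_pairwise_independent(k)
n = len(words[0])           # number of output bits
prob = 1.0 / len(words)     # each word is equally likely

mu = [sum(w[i] for w in words) * prob for i in range(n)]
nu = {(i, j): sum(w[i] * w[j] for w in words) * prob
      for i in range(n) for j in range(i + 1, n)}
entropy = log2(len(set(words)))  # uniform distribution over distinct words

print(n, set(mu), set(nu.values()), entropy)
```

Triplet statistics are not matched by this simple scheme, since some output bits are
the XOR of two others.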

Maximum entropy models are powerful tools for un- 
derstanding physical systems and they are proving to be 
useful for describing biology as well, but a deeper under- 
standing of the full solution space is needed as we explore 
systems less amenable to arguments involving ergodicity 
or equally accessible states. In some settings, minimum 
entropy models can also provide a floor on information 
transmission, complementary to channel capacity, which 
provides a ceiling on system performance. 
APPENDIX A: ALLOWED RANGE OF $\nu$ GIVEN
$\mu$ ACROSS ALL DISTRIBUTIONS FOR LARGE $N$

We begin by determining the upper bound on $\nu$, the probability of any pair of
neurons being simultaneously active, given $\mu$, the probability of any one neuron
being active, in the large $N$ regime, where $N$ is the total number of neurons. Time
is discretized and we assume any neuron can spike no more than once in a time bin. We
have $\nu \le \mu$ because $\nu$ is the probability of a pair of neurons firing
together and thus each neuron in that pair must have at least a firing probability of
$\nu$. Furthermore, it is easy to see that the case $\nu = \mu$ is feasible when there
are only two states with non-zero probabilities: all neurons silent ($p_0$) or all
neurons active ($p_1$). In this case, $p_1 = \mu = \nu$. We use the term "active" to
refer to neurons that are spiking, and thus equal to one, in a given time bin, and we
also refer to "active" states in a distribution, which are those with non-zero
probabilities.

We now proceed to show that the lower bound on $\nu$ for large $N$ is $\mu^2$, the
value of $\nu$ consistent with statistical independence among all $N$ neurons. We can
find the lower bound by viewing this as a linear programming problem [43, 54], where
the goal is to maximize $-\nu$ given the normalization constraint and the constraints
on $\mu$.

It will be useful to introduce the notion of an exchangeable distribution [18], for
which any permutation of the neurons in the binary words labeling the states leaves the
probability of each state unaffected. For example if $N = 3$, an exchangeable solution
satisfies

$$p(100) = p(010) = p(001), \tag{A.1}$$
$$p(110) = p(101) = p(011). \tag{A.2}$$

In other words, the probability of any given word depends only on the number of ones it
contains, not their particular locations, for an exchangeable distribution.

In order to find the allowed values of $\mu$ and $\nu$, we need only consider
exchangeable distributions. If there exists a probability distribution that satisfies
our constraints, we can always construct an exchangeable one that also does, given that
the constraints themselves are symmetric (Eqs. (1) and (2)). Let us do this explicitly:
Suppose we have a probability distribution $p(s)$ over binary words
$s = (s_1, \ldots, s_N) \in \{0, 1\}^N$ that satisfies our constraints but is not
exchangeable. We construct an exchangeable distribution $p_e(s)$ with the same
constraints as follows:

$$p_e(s) = \frac{1}{N!} \sum_{\sigma \in P_N} p(\sigma(s)), \tag{A.3}$$

where $\sigma$ is an element of the permutation group $P_N$ on $N$ elements. This
distribution is exchangeable by construction, and it is easy to verify that it
satisfies the same symmetric constraints as does the original distribution, $p(s)$.

Therefore, if we wish to find the maximum $-\nu$ for a given value of $\mu$, it is
sufficient to consider exchangeable distributions. From now on in this section we will
drop the $e$ subscript on our earlier notation, define $p$ to be exchangeable, and let
$p(i)$ be the probability of a state with $i$ spikes.

The normalization constraint is

$$\sum_{i=0}^{N} \binom{N}{i} p(i) = 1. \tag{A.4}$$

Here the binomial coefficient $\binom{N}{i}$ counts the number of states with $i$
active neurons.

The firing rate constraint is similar, only now we must consider summing only those
probabilities that have a particular neuron active. How many states are there with only
a pair of active neurons given that a particular neuron must be active in all of the
states? We have the freedom to place the remaining active neuron in any of the $N - 1$
remaining sites, which gives us $\binom{N-1}{1}$ states with probability $p(2)$. In
general if we consider states with $i$ active neurons, we will have the freedom to
place $i - 1$ of them in $N - 1$ sites, yielding:

$$\mu = \sum_{i=1}^{N} \binom{N-1}{i-1} p(i). \tag{A.5}$$

Finally, for the pairwise firing rate, we must add up states containing a specific pair
of active neurons, but the remaining $i - 2$ active neurons can be anywhere else:

$$\nu = \sum_{i=2}^{N} \binom{N-2}{i-2} p(i). \tag{A.6}$$
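
The symmetrization argument and the exchangeable forms of the constraints can be
checked numerically for a small system. The sketch below averages a distribution over
all permutations of the units, as described for Eq. (A.3), and then evaluates the
normalization, firing rate, and pairwise rate as binomially weighted sums over the
number of active units, as in Eqs. (A.4)-(A.6). The four-unit example distribution and
all function names are hypothetical illustrations.

```python
from itertools import permutations, product
from math import comb


def symmetrize(p, n):
    """Average p over all n! permutations of the units."""
    perms = list(permutations(range(n)))
    pe = {}
    for s in product((0, 1), repeat=n):
        total = sum(p.get(tuple(s[k] for k in sigma), 0.0) for sigma in perms)
        pe[s] = total / len(perms)
    return pe


def mean_rate(p, i):
    return sum(prob for s, prob in p.items() if s[i])


def pair_rate(p, i, j):
    return sum(prob for s, prob in p.items() if s[i] and s[j])


n = 4
# Hypothetical non-exchangeable example with symmetric constraints:
# s3 = s1 XOR s2, and s4 is an independent fair bit.
p = {(a, b, a ^ b, c): 0.125 for a, b, c in product((0, 1), repeat=3)}
pe = symmetrize(p, n)

# p_count[i]: common probability of every word with i active units.
p_count = [pe[(1,) * i + (0,) * (n - i)] for i in range(n + 1)]
norm = sum(comb(n, i) * p_count[i] for i in range(n + 1))            # cf. (A.4)
mu = sum(comb(n - 1, i - 1) * p_count[i] for i in range(1, n + 1))    # cf. (A.5)
nu = sum(comb(n - 2, i - 2) * p_count[i] for i in range(2, n + 1))    # cf. (A.6)

print(norm, mu, nu)
print(mean_rate(p, 0), pair_rate(p, 0, 1))  # statistics of the original p
```

The original distribution is not exchangeable (the word 1101 has probability 1/8 while
1110 has probability zero), yet the symmetrized version reproduces its mean and
pairwise rates.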



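Now our task can be formalized as finding the maximum value of

$$-\nu = -\sum_{i=0}^{N} \binom{N-2}{i-2} p(i) \tag{A.7}$$

subject to

$$\sum_{i=0}^{N} \binom{N}{i} p(i) = 1, \tag{A.8}$$
$$\sum_{i=1}^{N} \binom{N-1}{i-1} p(i) = \mu, \tag{A.9}$$
$$p(i) \ge 0, \quad \text{for all } i. \tag{A.10}$$

This gives us the following dual problem: Minimize

$$\mathcal{E} = \lambda_0 + \mu \lambda_1, \tag{A.11}$$

given the following $N + 1$ constraints (each labeled by $i$)

$$\binom{N}{i} \lambda_0 + \binom{N-1}{i-1} \lambda_1 \ge -\binom{N-2}{i-2}, \quad N \ge i \ge 0, \tag{A.12}$$

where $\binom{a}{b}$ is taken to be zero for $b < 0$. The principle of strong duality
[43] ensures that the value of the objective function at the solution is equal to the
extremal value of the original objective function $-\nu$.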

The set of constraints defines a convex region in the $\lambda_1$, $\lambda_0$ plane as
seen in Fig. A.1. The minimum of our dual objective generically occurs at a vertex of
the boundary of the allowed region. It can be shown that this occurs where Eq. (A.12)
is an equality for two adjacent values of $i$. Calling the first of these two values
$i_0$, we then have the following two equations that allow us to determine the optimal
values of $\lambda_0$ and $\lambda_1$ ($\lambda_0^*$ and $\lambda_1^*$, respectively)
as a function of $i_0$:

$$\binom{N}{i_0} \lambda_0^* + \binom{N-1}{i_0-1} \lambda_1^* = -\binom{N-2}{i_0-2}, \tag{A.13}$$
$$\binom{N}{i_0+1} \lambda_0^* + \binom{N-1}{i_0} \lambda_1^* = -\binom{N-2}{i_0-1}. \tag{A.14}$$

FIG. A.1. An example of the allowed values of $\lambda_0$ and $\lambda_1$ for the dual
problem ($N = 5$).

Solving for $\lambda_0^*$ and $\lambda_1^*$, we find

$$\lambda_0^* = \frac{i_0(i_0 + 1)}{N(N - 1)}, \tag{A.15}$$
$$\lambda_1^* = \frac{-2 i_0}{N - 1}. \tag{A.16}$$

Plugging this into Eq. (A.11) we find the optimal value $\mathcal{E}^*$ is

$$\mathcal{E}^* = \lambda_0^* + \mu \lambda_1^* \tag{A.17}$$
$$= \frac{i_0(i_0 + 1)}{N(N - 1)} - \frac{2 i_0 \mu}{N - 1} \tag{A.18}$$
$$= \frac{i_0(i_0 + 1 - 2\mu N)}{N(N - 1)}. \tag{A.19}$$

Now all that is left is to express $i_0$ as a function of $\mu$ and take the limit as
$N$ becomes large. This expression can be found by noting from Eq. (A.11) and Fig. A.1
that at the solution, $i_0$ satisfies

$$-m(i_0) \le \mu \le -m(i_0 + 1), \tag{A.20}$$

where $m(i)$ is the slope, $d\lambda_0/d\lambda_1$, of constraint $i$. The expression
for $m(i)$ is determined from Eq. (A.12):

$$m(i) = -\binom{N-1}{i-1} \Big/ \binom{N}{i} \tag{A.21}$$
$$= -\frac{i}{N}. \tag{A.22}$$

Substituting Eq. (A.22) into Eq. (A.20), we find

$$\frac{i_0}{N} \le \mu \le \frac{i_0 + 1}{N}. \tag{A.23}$$

This allows us to write

$$\mu = \frac{i_0 + b(N)}{N}, \tag{A.24}$$

where $b(N)$ is between 0 and 1 for all $N$. Solving this for $i_0$, we obtain

$$i_0 = N\mu - b(N). \tag{A.25}$$

Substituting Eq. (A.25) into Eq. (A.19), we find

$$\mathcal{E}^* = \frac{(N\mu - b(N))(N\mu - b(N) + 1 - 2N\mu)}{N(N - 1)} \tag{A.26}$$
$$= \frac{(N\mu - b(N))(-N\mu - b(N) + 1)}{N(N - 1)} \tag{A.27}$$
$$= -\mu^2 + O\left(\frac{1}{N}\right). \tag{A.28}$$

Taking the large $N$ limit we find that $\mathcal{E}^* = -\mu^2$, and by the principle
of strong duality [43] the maximum value of $-\nu$ is $-\mu^2$. Therefore we have shown
that for large $N$, the region of satisfiable constraints is simply

$$\mu \ge \nu \ge \mu^2, \tag{A.29}$$

as illustrated in Fig. A.2.
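
The large $N$ limit can also be examined with exact arithmetic. The sketch below
assumes, following the vertex argument of this appendix, that the extremal exchangeable
distribution places all of its mass on states with $i_0$ and $i_0 + 1$ active neurons,
with total weight $b = N\mu - i_0$ on the latter. It evaluates the resulting pairwise
rate and compares it with the dual optimum $i_0(i_0 + 1 - 2\mu N)/(N(N - 1))$ of
Eq. (A.19). The function names, and the choice of `fractions.Fraction` to avoid
rounding in the floor, are our own.

```python
from fractions import Fraction
from math import floor


def min_pair_rate(n, mu):
    """Smallest pairwise rate nu for n units with mean rate mu (primal vertex)."""
    i0 = min(floor(n * mu), n - 1)   # mass sits on i0 and i0 + 1 active units
    b = n * mu - i0                  # total weight on states with i0 + 1 active
    return i0 * (i0 - 1 + 2 * b) / Fraction(n * (n - 1))


def dual_optimum(n, mu):
    """Optimal dual objective, cf. Eq. (A.19)."""
    i0 = min(floor(n * mu), n - 1)
    return i0 * (i0 + 1 - 2 * mu * n) / Fraction(n * (n - 1))


mu = Fraction(3, 10)
for n in (5, 50, 500, 5000):
    nu_min = min_pair_rate(n, mu)
    # Primal and dual values agree, and nu_min approaches mu**2 from below.
    print(n, float(nu_min), float(-dual_optimum(n, mu)), float(mu**2))
```

For finite $N$ the minimum pairwise rate lies slightly below $\mu^2$, with a difference
of order $1/N$.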



APPENDIX B: THE MAXIMUM ENTROPY 
SOLUTION 

We begin by stating the general form for the solution for known mean firing rate and
pairwise constraints and impose the symmetry that all statistics are equal across
neurons and pairs of neurons. We will then demonstrate that for arbitrary fixed values
for $\mu$ and $\nu$, the maximum entropy must scale linearly with $N$ as
$N \to \infty$.

In general, the constraints can be written

$$\mu = \langle s_i \rangle = \sum_s p(s) s_i, \quad i = 1, \ldots, N, \tag{B.1}$$
$$\nu = \langle s_i s_j \rangle = \sum_s p(s) s_i s_j, \quad i \neq j, \tag{B.2}$$

where the sums run over all $2^N$ states of the system. In order to enforce the
constraints, we can add terms involving Lagrange multipliers $\lambda_i$ and
$\gamma_{ij}$ to the entropy in the usual fashion to arrive at a function to be
maximized



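$$S[\mathbf{p}] - \sum_i \lambda_i \left( \sum_s p(s) s_i - \mu \right) - \sum_{i<j} \gamma_{ij} \left( \sum_s p(s) s_i s_j - \nu \right). \tag{B.3}$$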
FIG. A.2. The red shaded region is the set of values for $\mu$ and $\nu$ that can be
satisfied for at least one probability distribution in the $N \to \infty$ limit. The
purple line along the diagonal where $\nu = \mu$ is the distribution for which only the
all active and all inactive states have non-zero probability. It represents the global
entropy minimum for a given value of $\mu$. The red parabola, $\nu = \mu^2$, at the
bottom border of the allowed region corresponds to a wide range of probability
distributions, including the global maximum entropy solution for given $\mu$ in which
each neuron fires independently. We find that low entropy solutions reside at this low
$\nu$ boundary as well.

Maximizing this function gives us the Boltzmann distribution for an Ising model

$$p(s) = \frac{1}{Z} \exp\left( -\sum_i \lambda_i s_i - \sum_{i<j} \gamma_{ij} s_i s_j \right), \tag{B.4}$$

where $Z$ is the normalization factor or partition function. The values of $\lambda_i$
and $\gamma_{ij}$ are left to be determined by ensuring this distribution is consistent
with our constraints $\mu$ and $\nu$.

It can be shown that for symmetric constraints the Lagrange multipliers are uniform. In
other words,

$$\lambda_i = \lambda, \quad \forall i, \tag{B.5}$$
$$\gamma_{ij} = \gamma, \quad \forall i < j. \tag{B.6}$$

This allows us to write the following expression for the maximum entropy distribution:

$$p(s) = \frac{1}{Z} \exp\left( -\lambda \sum_i s_i - \gamma \sum_{i<j} s_i s_j \right). \tag{B.7}$$

If there are $k$ neurons active, this becomes

$$p(k) = \frac{1}{Z} \exp\left( -\lambda k - \gamma \frac{k(k - 1)}{2} \right). \tag{B.8}$$

Note that there are $\binom{N}{k}$ states with probability $p(k)$. Using expression
(B.8), we find the maximum entropy by using the fsolve function from the SciPy package
of Python subject to constraints (B.1) and (B.2).

FIG. B.1. The maximum possible entropy scales linearly with system size, $N$, as shown
here for various values of $\mu$ and $\nu$. Note that this linear scaling holds even
for large correlations.



As Fig. B.1 shows, the entropy scales linearly as a function of $N$, even in cases
where the correlations between all pairs of neurons ($\nu$) are quite large. While this
is perhaps a surprising result, we can see that this must be the case for independent
neurons, the maximum entropy solution with $\nu = \mu^2$. Because each neuron is
independent, the entropy of this system must certainly scale linearly with $N$.

Moreover, we can construct a distribution that has entropy with linear scaling for any
allowed values of $\mu$ and $\nu$ using this solution. Recall that the vector
$\mathbf{p}$ represents the full distribution over all $2^N$ states. Consider the
following probability distribution $\mathbf{p}_{\mathrm{pop}}$, which we will call the
"population spike" model. This model contains only two states with non-zero
probability: the state with all neurons active ($p_1$) and the state with no active
neurons ($p_0$). They are weighted so that the firing rate of this model matches that
of the independence model:

$$p_1 = \mu, \tag{B.9}$$
$$p_0 = 1 - \mu. \tag{B.10}$$

As mentioned above, in this model $\nu$ is equal to its maximum allowed value, $\mu$.
Because the independent model has the smallest allowed value of $\nu$ (in the large $N$
limit), we can combine these two models to create a one-parameter family of
distributions that have fixed $\mu$ value and cover all allowed values for $\nu$. The
independent part of this distribution will guarantee that the entire family has an
entropy that scales linearly with $N$; thus, the true maximum must grow at least
linearly with $N$ as well. Our new distribution $\mathbf{p}_{\mathrm{mix}}$ is simply

$$\mathbf{p}_{\mathrm{mix}} = (1 - x)\mathbf{p}_{\mathrm{ind}} + x\mathbf{p}_{\mathrm{pop}}, \quad \text{where } 0 \le x \le 1. \tag{B.11}$$

$\mathbf{p}_{\mathrm{mix}}$ has firing rate $\mu$ (just like both
$\mathbf{p}_{\mathrm{ind}}$ and $\mathbf{p}_{\mathrm{pop}}$) and
$\nu = (1 - x)\mu^2 + x\mu$.

Because entropy is a concave function [20], by Jensen's inequality the entropy of
$\mathbf{p}_{\mathrm{mix}}$ is bounded below by

$$S[\mathbf{p}_{\mathrm{mix}}] \ge (1 - x) S[\mathbf{p}_{\mathrm{ind}}] + x S[\mathbf{p}_{\mathrm{pop}}]. \tag{B.12}$$

For fixed $\mu$ and $\nu$ the second term is a constant in $N$, whereas the first term
grows linearly with $N$. This implies that the true maximum entropy must grow at least
linearly with $N$ for any fixed values of $\mu$ and $\nu$.
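
The mixture argument can be checked numerically. The sketch below assumes the mixture
$(1 - x)\,\mathbf{p}_{\mathrm{ind}} + x\,\mathbf{p}_{\mathrm{pop}}$ with
$x = (\nu - \mu^2)/(\mu - \mu^2)$, obtained by solving $\nu = (1 - x)\mu^2 + x\mu$ for
$x$. Because the mixture is exchangeable, its exact entropy can be summed over the
number of active neurons. The function names and the example values of $\mu$, $\nu$,
and $N$ are our own choices for illustration.

```python
from math import comb, log2


def binary_entropy(q):
    """Entropy in bits of a two-outcome distribution (q, 1 - q)."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * log2(q) - (1.0 - q) * log2(1.0 - q)


def mixture_entropy(n, mu, x):
    """Exact entropy (bits) of (1 - x) p_ind + x p_pop for n units."""
    total = 0.0
    for k in range(n + 1):
        # Probability of one particular word with k active units.
        p_state = (1.0 - x) * mu**k * (1.0 - mu) ** (n - k)
        if k == n:
            p_state += x * mu            # all-active state of p_pop
        if k == 0:
            p_state += x * (1.0 - mu)    # all-silent state of p_pop
        if p_state > 0.0:
            total -= comb(n, k) * p_state * log2(p_state)
    return total


mu, nu, n = 0.1, 0.011, 20
x = (nu - mu**2) / (mu - mu**2)
# Right-hand side of the Jensen bound: S[p_ind] = n h(mu), S[p_pop] = h(mu).
jensen = (1.0 - x) * n * binary_entropy(mu) + x * binary_entropy(mu)
exact = mixture_entropy(n, mu, x)
print(x, jensen, exact, n)
```

The exact entropy lies between the Jensen lower bound and the trivial upper bound of
$N$ bits, and both bounds are linear in $N$.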

We note that there is a simple upper bound on the entropy that also scales linearly
with $N$. The maximum possible entropy for fixed $N$ is obtained by setting all
probabilities equal to one another, yielding an entropy of exactly $N$ (in fact, this
is the entropy of the independence model with $\mu = 1/2$). Considering that both the
upper bound and lower bound for the maximum entropy for fixed $\mu$ and $\nu$ scale
linearly, the maximum entropy itself must also scale linearly for large $N$, consistent
with our computations (Fig. B.1).



APPENDIX C: MINIMUM ENTROPY OCCURS 
AT SMALL SUPPORT 

Our goal is to minimize the entropy function 



(C.l) 



i=0 



where Ug is the number of states, the pi satisfy a set of 
ric independent linear constraints, and Pi > for all i. 
For the main problem we consider, Ug = 2^ and normal- 
ization, mean firing rates and pairwise firing rates give 
71c = 1 + iV -|- N{N — l)/2. For the exchangeable case 
with symmetric constraints, ris = N + 1 and ric = 3. 

Our task is therefore to minimize a globally concave function over the part of a $d = n_s - n_c$ dimensional linear (affine) space L that lies in the (compact) simplex $\{p : \sum_i p_i = 1;\ p_i \geq 0\}$. It is well known that the minima of such a problem occur at the vertices of the boundary of the space [44], which necessarily have some $p_i$ equal to zero, unless L intersects the simplex in a single point. Moreover, if a distribution satisfying the constraints exists, then there is one with at most $n_c$ non-zero $p_i$ (e.g., from arguments as in [5S]). Together, these two facts imply that there are minimum entropy distributions with at most $n_c$ non-zero $p_i$ (and occasionally fewer). This means that even though the state space may grow exponentially with N, the support of the minimum entropy solution for fixed means and pairwise correlations will only scale quadratically with N. In fact, we know that for certain values of $\mu$ and $\nu$ solutions can have a far smaller support because the construction shown in Appendix F has a support size that scales only linearly with N.
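To make the gap between the state space and the support bound concrete, the following minimal sketch tabulates $n_s = 2^N$ against $n_c = 1 + N + N(N-1)/2$ for a few system sizes. The function names are illustrative choices and do not appear in the original analysis.

```python
def num_states(N):
    """Number of binary states n_s for N units."""
    return 2 ** N


def num_constraints(N):
    """n_c = 1 (normalization) + N (means) + N(N-1)/2 (pairwise terms)."""
    return 1 + N + N * (N - 1) // 2


# The support bound n_c grows quadratically while n_s grows exponentially.
for N in (5, 10, 20, 40):
    print(N, num_states(N), num_constraints(N))
```

Already at N = 40 the bound on the support is in the hundreds, whereas the number of states exceeds $10^{12}$.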



APPENDIX D: MINIMUM ENTROPY FOR 
EXCHANGEABLE PROBABILITY 
DISTRIBUTIONS 

Although the values of the firing rate ($\mu$) and pairwise correlations ($\nu$) may be identical for each neuron and pair of neurons, the probability distribution that gives rise to these statistics need not be exchangeable, as we have already shown. Indeed, as we explain below, it is possible to construct non-exchangeable probability distributions that have dramatically lower entropy than both the maximum and the minimum entropy for exchangeable distributions. That said, exchangeable solutions are interesting in their own right because they have large N scaling behavior that is distinct from the global entropy minimum, and they provide a symmetry that can be used to lower bound the information transmission rate close to the maximum possible across all distributions.

Restricting ourselves to exchangeable solutions represents a significant simplification. In the general case, there are $2^N$ probabilities to consider for a system of N neurons. There are N constraints on the firing rates (one for each neuron) and $\binom{N}{2}$ pairwise constraints (one for each pair of neurons). Together with normalization, this gives us a total number of constraints ($n_c$) that grows quadratically with N:

$$n_c = 1 + \frac{N(N+1)}{2}. \tag{D.1}$$



However, in the exchangeable case, all states with the same number of spikes have the same probability, so there are only $N + 1$ free parameters. Moreover, the number of constraints becomes 3, as there is only one constraint each for normalization, firing rate, and pairwise firing rate (as expressed in Eqs. (A.4), (A.5), and (A.6), respectively).

In general, the minimum entropy solution for ex- 
changeable distributions should have the minimum sup- 
port consistent with these three constraints. Therefore, 
the minimum entropy solution should have at most three 
non-zero probabilities. 
For the symmetrical case with $\mu = 1/2$ and $\nu = 1/4$, we can construct the exchangeable distribution with minimum entropy for all even N. This distribution consists of the all ones state, the all zeros state, and all states with $N/2$ ones. The constraint $\mu = 1/2$ implies that $p(0) = p(N)$, and the condition $\nu = 1/4$ implies

$$p(N/2) = \frac{N-1}{N}\,\frac{[(N/2)!]^2}{N!}, \qquad N \text{ even}, \tag{D.2}$$

which corresponds to an entropy of

\begin{align}
\tilde{S}_2^{\mathrm{exch}} &= \frac{\log_2(2N)}{N} - \frac{N-1}{N}\log_2\left(\frac{[(N/2)!]^2\,(N-1)}{N!\,N}\right) \tag{D.3}\\
&\approx N - \frac{1}{2}\log_2(N) - \frac{1}{2}\log_2(2\pi) + O\left[\frac{\log_2(N)}{N}\right]. \tag{D.4}
\end{align}



For arbitrary values of $\mu$, $\nu$ and N, it is difficult to determine from first principles which three probabilities are non-zero for the minimum entropy solution, but fortunately the number of possibilities $\binom{N+1}{3}$ is now small enough that we can exhaustively search by computer to find the set of non-zero probabilities corresponding to the lowest entropy.
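The exhaustive search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the figures: the function names are ours, $P_k$ denotes the total probability of all states with $k$ spikes, and the three constraints are assumed to take the form $\sum_k P_k = 1$, $\sum_k P_k\,k/N = \mu$, and $\sum_k P_k\,k(k-1)/[N(N-1)] = \nu$. Exact rational arithmetic is used so that the non-negativity check is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, log2


def solve3(A, b):
    """Solve a 3x3 linear system exactly by Cramer's rule (None if singular)."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(A)
    if d == 0:
        return None
    return [det([[b[r] if c == j else A[r][c] for c in range(3)]
                 for r in range(3)]) / d for j in range(3)]


def exchangeable_min_entropy(N, mu, nu):
    """Search all triples of spike counts; return (entropy, counts, weights)."""
    best = None
    for ks in combinations(range(N + 1), 3):
        A = [[Fraction(1)] * 3,
             [Fraction(k, N) for k in ks],
             [Fraction(k * (k - 1), N * (N - 1)) for k in ks]]
        P = solve3(A, [Fraction(1), mu, nu])
        if P is None or any(p < 0 for p in P):
            continue  # this triple cannot satisfy the constraints
        # Each of the comb(N, k) states with k spikes has probability P_k / comb(N, k).
        S = -sum(float(p) * log2(float(p) / comb(N, k))
                 for p, k in zip(P, ks) if p > 0)
        if best is None or S < best[0]:
            best = (S, ks, P)
    return best


S, counts, weights = exchangeable_min_entropy(4, Fraction(1, 2), Fraction(1, 4))
print(S, counts, weights)
```

Solutions supported on fewer than three spike counts are covered automatically, since they appear as triples in which one weight is zero.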

Using this technique, we find that the scaling behavior of the exchangeable minimum entropy is linear with N as shown in Fig. D.1. We find that the asymptotic slope is positive, but less than that of the maximum entropy curve, for all $\nu \neq \mu^2$. For the symmetrical case, $\nu = \mu^2$, our exact expression Eq. (D.3) for the exchangeable distribution consisting of the all ones state, the all zeros state, and all states with $N/2$ ones agrees with the minimum entropy exchangeable solution found by exhaustive search, and in this special case the asymptotic slope is identical to that of the maximum entropy curve.



APPENDIX E: CONSTRUCTION OF A LOW 
ENTROPY DISTRIBUTION FOR ALL VALUES 
OF $\mu$ AND $\nu$

We can construct a probability distribution with roughly $N^2$ states with nonzero probability out of the full $2^N$ possible states of the system such that

$$\mu = \frac{n}{N}, \qquad \nu = \frac{n(n-1)}{N(N-1)}, \tag{E.1}$$

where N is the number of neurons in the network and n is the number of neurons that are active in every state. Using this solution as a basis, we can include the states with all neurons active and all neurons inactive to create a low entropy solution for all allowed values for $\mu$ and $\nu$ (see Appendix G). We refer to the entropy of this low entropy construction as $\tilde{S}_2^{\mathrm{con2}}$ to distinguish it from the entropy ($\tilde{S}_2^{\mathrm{con}}$) of another low entropy solution described in the next section. Our construction essentially goes back to Joffe as explained by Luby in [27].

FIG. D.1. The minimum entropy for exchangeable distributions versus N for various values of $\mu$ and $\nu$. Note that, like the maximum entropy, the exchangeable minimum entropy scales linearly with N as $N \to \infty$, albeit with a smaller slope for $\nu \neq \mu^2$. We can calculate the entropy exactly for $\mu = 0.5$ and $\nu = 0.25$ as $N \to \infty$, and we find that the leading term is indeed linear: $N - \frac{1}{2}\log_2(N) - \frac{1}{2}\log_2(2\pi) + O[\log_2(N)/N]$.

We derive our construction by first assuming that N is a prime number, but this is not actually a limitation as we will be able to extend the result to all values of N. Specifically, non-prime system sizes are handled by taking a solution for a larger prime number and removing the appropriate number of neurons. It should be noted that the solution derived using the next largest prime number does not necessarily have the lowest entropy, and occasionally we must use even larger primes to find the minimum entropy possible using this technique; all plots in the main text were obtained by searching for the lowest entropy solution using the 10 smallest primes that are each at least as great as the system size N.

We begin by illustrating our algorithm with a concrete example; following this illustrative case we will prove that each step does what we expect in general. Consider $N = 5$ and $n = 3$. The algorithm is as follows:

1. Begin with the state with $n = 3$ active neurons in a row:

11100

2. Generate new states by inserting progressively larger gaps of 0s before each 1 and wrapping active states that go beyond the last neuron back to the beginning. This yields $N - 1 = 4$ unique states including the original state:

11100
10101
11010
10011

3. Finally, "rotate" each state by shifting each pattern of ones and zeros to the right (again wrapping states that go beyond the last neuron). This yields a total of $N(N-1)$ states:

11100 01110 00111 10011 11001
10101 11010 01101 10110 01011
11010 01101 10110 01011 10101
10011 11001 11100 01110 00111

4. Note that each state is represented twice in this collection; removing duplicates, we are left with $N(N-1)/2$ total states. By inspection we can verify that each neuron is active in $n(N-1)/2$ states and each pair of neurons is represented in $n(n-1)/2$ states. Weighting each state with equal probability gives us the values for $\mu$ and $\nu$ stated in Eq. (E.1).
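The four steps above can be sketched in a few lines. This is an illustrative implementation under our own naming (`cyclic_states` and `statistics` are not from the original text); each state is stored as the set of active neuron indices, so duplicates are removed automatically, and exact fractions are used for the statistics.

```python
from fractions import Fraction
from itertools import combinations


def cyclic_states(N, n):
    """Unique states {(s*i + d) mod N : i < n} for s = 1..N-1, d = 0..N-1 (N prime)."""
    states = set()
    for s in range(1, N):        # step 2: spacing between spikes
        for d in range(N):       # step 3: rotation
            states.add(frozenset((s * i + d) % N for i in range(n)))
    return states                # step 4: the set drops duplicates


def statistics(states, N):
    """Per-neuron activity and per-pair co-activity under uniform weights."""
    total = len(states)
    mu = [Fraction(sum(l in st for st in states), total) for l in range(N)]
    nu = {(a, b): Fraction(sum(a in st and b in st for st in states), total)
          for a, b in combinations(range(N), 2)}
    return mu, nu


states = cyclic_states(5, 3)
mu, nu = statistics(states, 5)
print(len(states), set(mu), set(nu.values()))
```

For $N = 5$ and $n = 3$ this yields 10 states, with every neuron and every pair sharing the same statistics, in agreement with Eq. (E.1).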



Now we will prove that this construction works in general for N prime and any value of n by establishing (1) that step 2 of the above algorithm produces a set of states with n spikes, (2) that this method produces a set of states that when weighted with equal probability yield neurons that all have the same firing rates and pairwise statistics, and (3) that this method produces at least double redundancy in the states generated as stated in step 4 (although in general there may be a greater redundancy). In discussing (1) and (2) we will neglect the issue of redundancy and consider the states produced through step 3 as distinct.

First we prove that step 2 always produces states with n active neurons, which is to say that no two spikes are mapped to the same location as we shift them around. We will refer to the identity of the spikes by their location in the original starting state; this is important as the operations in steps 2 and 3 will change the relative ordering of the original spikes in their new states. With this in mind, for a spacing of s between spikes, the ith spike is moved to the new location l (here the original state with all spikes in a row is $s = 1$):

$$l = (s \cdot i) \bmod N, \tag{E.2}$$

where $i \in \{0, 1, 2, \ldots, n-1\}$. In this form, our statement of the problem reduces to demonstrating that for given values of s and N, no two values of i will result in the same l. This is easy to show by contradiction. If this were the case,

\begin{align}
(s \cdot i_1) \bmod N &= (s \cdot i_2) \bmod N \tag{E.3}\\
(s \cdot (i_1 - i_2)) \bmod N &= 0. \tag{E.4}
\end{align}

For this to be true, either s or $(i_1 - i_2)$ must contain a factor of N, but each is smaller than N, so we have a contradiction. This also demonstrates why N must be prime: if it were not, it would be possible to satisfy this equation in cases where s and $(i_1 - i_2)$ contain between them all the factors of N.

It is worth noting that this also shows that there is a one-to-one mapping between s and l given i. In other words, each spike is taken to every possible neuron in step 2. For example, if $N = 5$, and we fix $i = 2$:

\begin{align}
(0 \cdot 2) \bmod 5 &= 0\\
(1 \cdot 2) \bmod 5 &= 2\\
(2 \cdot 2) \bmod 5 &= 4\\
(3 \cdot 2) \bmod 5 &= 1\\
(4 \cdot 2) \bmod 5 &= 3
\end{align}
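The role of primality in this mapping can be checked directly. The short sketch below (with function names of our own choosing) lists the locations visited by a given spike as s varies and tests whether the map is one-to-one; a composite N is included for contrast.

```python
def spike_locations(N, i):
    """Locations l = (s * i) mod N visited by spike i as s runs over 0..N-1."""
    return [(s * i) % N for s in range(N)]


def is_one_to_one(N, i):
    """True if every location is visited exactly once."""
    return len(set(spike_locations(N, i))) == N


print(spike_locations(5, 2))   # [0, 2, 4, 1, 3]
print(is_one_to_one(5, 2), is_one_to_one(6, 2))
```

For $N = 6$ and $i = 2$ the locations repeat, because s and i can share between them the factors of a composite N.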



If we now perform the operation in step 3, then the location l of spike i becomes

$$l = (s \cdot i + d) \bmod N, \tag{E.5}$$

where d is the amount by which the state has been rotated (the first column in step 3 is $d = 0$, the second is $d = 1$, etc.). It should be noted that step 3 trivially preserves the number of spikes in our states, so we have established that steps 2 and 3 produce only states with n spikes.

We now show that each neuron is active, and each pair 
of neurons is simultaneously active, in the same number 
of states. This way when each of these states is weighted 
with equal probability, we find symmetric statistics for 
these two quantities. 

Beginning with the firing rate, we ask how many states contain a spike at location l. In other words, how many combinations of s, i, and d can we take such that Eq. (E.5) is satisfied for a given l? For each choice of s and i there is a unique value of d that satisfies the equation. Here s can take values between 1 and $N - 1$, and i takes values from 0 to $n - 1$, which gives us $n(N-1)$ states that include a spike at location l. Dividing by the total number of states $N(N-1)$, we obtain an average firing rate of

$$\mu = \frac{n}{N}. \tag{E.6}$$



Consider neurons at $l_1$ and $l_2$; we wish to know how many values of s, d, $i_1$ and $i_2$ we can pick so that

\begin{align}
l_1 &= (s \cdot i_1 + d) \bmod N, \tag{E.7}\\
l_2 &= (s \cdot i_2 + d) \bmod N. \tag{E.8}
\end{align}

Taking the difference between these two equations, we find

$$\Delta l = (s \cdot (i_2 - i_1)) \bmod N. \tag{E.9}$$

From our discussion above, we know that this equation uniquely specifies s for any choice of $i_1$ and $i_2$. Furthermore, we must pick d such that Eqs. (E.7) and (E.8) are satisfied. This means that for each choice of $i_1$ and $i_2$ there is a unique choice of s and d, which results in a state that includes active neurons at locations $l_1$ and $l_2$. Swapping $i_1$ and $i_2$ will result in a different s and d. Therefore, we have $n(n-1)$ states that include any given pair, one for each ordered choice of $i_1$ and $i_2$. Dividing this number by the total number of states, we find a correlation $\nu$ equal to

$$\nu = \frac{n(n-1)}{N(N-1)}, \tag{E.10}$$

where N is prime.

Finally we return to the question of redundancy among states generated by steps 1 through 3 of the algorithm. Although there may be a high level of redundancy for choices of n that are small or close to N, we can show that in general there is at least a twofold degeneracy. Although this does not impact our calculation of $\mu$ and $\nu$ above, it does alter the number of states, which will affect the entropy of the system.

The source of the twofold symmetry can be seen imme- 
diately by noting that the third and fourth rows of our 
example contain the same set of states as the second and 
first respectively. The reason for this is that each state in 
the s = 4 case involves spikes that are one leftward step 
away from each other just as s = 1 involves spikes that 
are one rightward shift away from each other. The labels 
we have been using to refer to the spikes have reversed 
order but the set of states are identical. Similarly the 
s = 3 case contains all states with spikes separated by 
two leftward shifts just as the s = 2 case. Therefore, the 
set of states with $s = a$ is equivalent to the set of states with $s = N - a$. Taking this degeneracy into account, there are at most $N(N-1)/2$ unique states; each neuron spikes in $n(N-1)/2$ of these states and any given pair spikes together in $n(n-1)/2$ states.

Because these states each have equal probability, the entropy of this system is bounded from above by

$$\tilde{S}_2^{\mathrm{con2}} \leq \log_2\left[\frac{N(N-1)}{2}\right], \tag{E.11}$$

where N is prime. As mentioned above, we write this as an inequality because further degeneracies among states beyond the factor of two that always occurs are possible for some prime numbers. In fact, in order to avoid non-monotonic behavior, the curves for $\tilde{S}_2^{\mathrm{con2}}$ shown in Figs. 1, 2 of the main text were generated using the lowest entropy found for the 10 smallest primes greater than N for each value of N.



We can extend this result to arbitrary values for N, including non-primes, by invoking the Bertrand-Chebyshev theorem, which states that there always exists at least one prime number p with $n < p < 2n - 2$ for any integer $n > 3$:

$$\tilde{S}_2^{\mathrm{con2}} \leq \log_2\left(N(2N-1)\right), \tag{E.12}$$

where N is any integer. Unlike the maximum entropy and the entropy of the exchangeable solution, which we have shown to both be extensive quantities, this scales only logarithmically with the system size N.
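As a quick numerical check of how Eqs. (E.11) and (E.12) fit together, the sketch below evaluates the prime-based bound using the smallest prime at least as large as N and compares it with the closed-form bound. The helper names are ours, and only a single prime is used here, whereas the plots in the main text search over several primes.

```python
from math import log2


def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    k = 2
    while k * k <= m:
        if m % k == 0:
            return False
        k += 1
    return True


def next_prime_at_least(N):
    """Smallest prime p with p >= N."""
    p = max(N, 2)
    while not is_prime(p):
        p += 1
    return p


def entropy_bound(N):
    """Right-hand side of Eq. (E.11) evaluated at the smallest prime p >= N."""
    p = next_prime_at_least(N)
    return log2(p * (p - 1) / 2)


for N in (4, 10, 100, 1000):
    print(N, entropy_bound(N), log2(N * (2 * N - 1)))
```

Both columns grow logarithmically with N, and the prime-based value never exceeds the closed-form bound of Eq. (E.12).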



APPENDIX F: ANOTHER LOW ENTROPY 
CONSTRUCTION FOR THE 
COMMUNICATIONS REGIME, $\mu = 1/2$, $\nu = 1/4$

We have found another low entropy construction in the regime most relevant for communications systems ($\mu = 1/2$, $\nu = 1/4$) that allows us to satisfy our constraints for a system of N neurons with only 2N active states. The algorithm to determine the states needed is recursive in that the states needed for $N = 2^q$ are built from the states needed for $N = 2^{q-1}$, where q is any integer greater than or equal to 2.

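We begin with $N = 2^1 = 2$. Here we can easily write down a set of states that when weighted equally lead to the desired statistics. Listing these states as rows of zeros and ones, we see that they include all possible two-neuron states:

$$\begin{matrix} 1 & 1 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{matrix} \tag{F.1}$$

In order to find the states needed for $N = 4$ we replace each 1 in the above by

$$\begin{matrix} 1 & 1 \\ 1 & 0 \end{matrix} \tag{F.2}$$

and each 0 by

$$\begin{matrix} 0 & 0 \\ 0 & 1 \end{matrix} \tag{F.3}$$

to arrive at a new array for twice as many neurons and twice as many states with nonzero probability:

$$\begin{matrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \end{matrix} \tag{F.4}$$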
By inspection, we can verify that each new neuron is spiking in half of the above states and each pair is spiking in a quarter of the above states. This procedure preserves $\mu = 1/2$, $\nu = 1/4$, and $\langle s_i s_j s_k \rangle = 1/8$ for all neurons, thus providing a distribution that mimics the statistics of independent binary variables up to third order (although it does not for higher orders). Let us consider the proof that $\mu = 1/2$ is preserved by this transformation. In the process of doubling the number of states in going from $N = 2^{q}$ to $N = 2^{q+1}$ neurons, each neuron with firing rate $\mu^{(q)}$ "produces" two new neurons with firing rates $\mu_1^{(q+1)}$ and $\mu_2^{(q+1)}$. It is clear from Eqs. (F.2) and (F.3) that we obtain the following two relations,

\begin{align}
\mu_1^{(q+1)} &= \mu^{(q)}, \tag{F.5}\\
\mu_2^{(q+1)} &= \tfrac{1}{2}\mu^{(q)} + \tfrac{1}{2}\left(1 - \mu^{(q)}\right) \tag{F.6}\\
&= 1/2. \tag{F.7}
\end{align}

It is clear from these equations that if we begin with $\mu^{(1)} = 1/2$, this will be preserved by this transformation. By similar, but more tedious, methods one can show that $\nu = 1/4$ and $\langle s_i s_j s_k \rangle = 1/8$.

Therefore, we are able to build up arbitrarily large groups of neurons that satisfy our statistics using only 2N states by repeating the procedure that took us from $N = 2$ to $N = 4$. Since these states are weighted with equal probability, we have an entropy that grows only logarithmically with N:

$$\tilde{S}_2^{\mathrm{con}} = \log_2(2N), \qquad N = 2^q,\ q = 2, 3, 4, \ldots \tag{F.8}$$
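The recursive substitution of Eqs. (F.1) through (F.3) is straightforward to implement and to check numerically. The following sketch uses names of our own choosing (`double`, `build`, `moment`) and verifies the first-, second-, and third-order statistics with exact fractions; it is meant as an illustration rather than as the procedure used to produce the figures.

```python
from fractions import Fraction
from itertools import combinations

ONE = ((1, 1), (1, 0))    # block substituted for each 1, Eq. (F.2)
ZERO = ((0, 0), (0, 1))   # block substituted for each 0, Eq. (F.3)


def double(states):
    """Replace every entry by its 2x2 block: twice the rows, twice the columns."""
    new = []
    for row in states:
        for r in range(2):
            new.append(tuple(x for bit in row
                             for x in (ONE if bit else ZERO)[r]))
    return new


def build(q):
    """States for N = 2**q neurons, starting from the two-neuron states (F.1)."""
    states = [(1, 1), (1, 0), (0, 0), (0, 1)]
    for _ in range(q - 1):
        states = double(states)
    return states


def moment(states, idx):
    """Fraction of states in which all neurons in idx are simultaneously active."""
    return Fraction(sum(all(st[i] for i in idx) for st in states), len(states))


states = build(3)  # N = 8 neurons, 2N = 16 equally weighted states
print(len(states),
      {moment(states, c) for c in combinations(range(8), 1)},
      {moment(states, c) for c in combinations(range(8), 2)},
      {moment(states, c) for c in combinations(range(8), 3)})
```

Calling `build(2)` reproduces the array in Eq. (F.4), and the fourth-order moment of that array is 1/8 rather than 1/16, consistent with the statement that independence is mimicked only up to third order.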



We mention briefly a geometrical interpretation of this probability distribution. The active states in this distribution can be thought of as a subset of 2N corners on an N dimensional hypercube with the property that the separation of almost every pair is the same. Specifically, for each active state, all but one of the other active states have a Hamming distance of exactly $N/2$ from the original state; the remaining state is on the opposite side of the cube, and thus has a Hamming distance of N. In other words, for any pair of polar opposite active states, there are $2N - 2$ active states around the "equator."
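This geometrical picture can be verified by tabulating Hamming distances between the active states. The sketch below rebuilds the states with a compact version of the substitution rule (the helper names are ours) and counts, for every state, how many of the other states lie at each distance.

```python
from collections import Counter


def build(q):
    """States for N = 2**q neurons via the substitution 1 -> [[1,1],[1,0]], 0 -> [[0,0],[0,1]]."""
    blocks = {1: ((1, 1), (1, 0)), 0: ((0, 0), (0, 1))}
    states = [(1, 1), (1, 0), (0, 0), (0, 1)]
    for _ in range(q - 1):
        states = [tuple(x for bit in row for x in blocks[bit][r])
                  for row in states for r in range(2)]
    return states


def hamming_profile(states):
    """For each state, count the other states at each Hamming distance."""
    return [Counter(sum(a != b for a, b in zip(s, t))
                    for t in states if t != s)
            for s in states]


profile = hamming_profile(build(3))  # N = 8
print(profile[0])
```

For $N = 8$, each state should see 14 states at distance 4 and a single state at distance 8.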



We can extend Eq. (F.8) to arbitrary numbers of neurons that are not powers of 2 by taking the least power of 2 at least as great as N, so that in general:

$$\tilde{S}_2^{\mathrm{con}} \leq \lceil \log_2(2N) \rceil < \log_2(N) + 2, \qquad N \geq 2. \tag{F.9}$$

By adding two other states we can extend this probability distribution so that it covers most of the allowed region for $\mu$ and $\nu$ while remaining a low entropy solution, as we now describe.

We remark that the authors of [25, 28] provide a lower bound of $\Omega(N)$ for the sample size possible for a pairwise independent binary distribution, making the sample size of our novel construction essentially optimal.



APPENDIX G: EXTENDING THE RANGE OF 
VALIDITY FOR THE CONSTRUCTIONS 



We now show that each of these low entropy probability distributions can be generalized to cover much of the allowed region depicted in Fig. A.2; in fact, the distribution derived in Appendix E can be extended to include all possible combinations of the constraints $\mu$ and $\nu$. This can be accomplished by including two additional states: the state where all neurons are silent and the state where all neurons are active. If we weight these states by probabilities $p_0$ and $p_1$ respectively and allow the $N(N-1)/2$ original states to carry probability $p_n$ in total, normalization requires

$$p_0 + p_n + p_1 = 1. \tag{G.1}$$



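We can express the value of the new constraints ($\mu'$ and $\nu'$) in terms of the original constraint values ($\mu$ and $\nu$) as follows:

\begin{align}
\mu' &= (1 - p_0 - p_1)\mu + p_1 \tag{G.2}\\
&= (1 - p_0)\mu + p_1(1 - \mu), \tag{G.3}\\
\nu' &= (1 - p_0)\nu + p_1(1 - \nu). \tag{G.4}
\end{align}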


These values span a triangular region in the $\mu$-$\nu$ plane that covers the majority of satisfiable constraints. Fig. G.1 illustrates the situation for $\mu = 1/2$. Note that by starting with other values of $\mu$, we can construct a low entropy solution for any possible constraints $\mu'$ and $\nu'$.

With the addition of these two states, the entropy of the expanded system $\tilde{S}_2^{\mathrm{con2}'}$ is bounded from above by

$$\tilde{S}_2^{\mathrm{con2}'} \leq p_n \tilde{S}_2^{\mathrm{con2}} - \sum_{i \in \{0, 1, n\}} p_i \log_2(p_i). \tag{G.5}$$



For given values of $\mu'$ and $\nu'$, the $p_i$ are fixed and only the first term depends on N. This means that, like the original distribution, the entropy of this distribution scales logarithmically with N. Therefore, by picking our original distribution properly, we can find low entropy distributions for any $\mu$ and $\nu$ for which the number of active states grows as a polynomial in N (see Fig. G.1).
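The extension can be made concrete by solving Eqs. (G.1) through (G.4) for the mixing weights. Subtracting the expression for $\nu'$ from that for $\mu'$ gives $\mu' - \nu' = p_n(\mu - \nu)$, from which the remaining weights follow. The sketch below uses function names of our own choosing and assumes $\mu \neq \nu$; it does not check that the returned weights are non-negative, which is what determines whether a target $(\mu', \nu')$ lies in the triangular region.

```python
from fractions import Fraction
from math import log2


def mixing_weights(mu, nu, mu_new, nu_new):
    """Solve Eqs. (G.1)-(G.4) for (p0, pn, p1); assumes mu != nu."""
    pn = (mu_new - nu_new) / (mu - nu)   # from mu' - nu' = pn * (mu - nu)
    p1 = mu_new - pn * mu                # from mu' = pn * mu + p1
    p0 = 1 - pn - p1                     # normalization, Eq. (G.1)
    return p0, pn, p1


def extended_entropy_bound(base_entropy, weights):
    """Right-hand side of Eq. (G.5): pn * S - sum_i p_i log2(p_i)."""
    p0, pn, p1 = weights
    mixing = -sum(float(p) * log2(float(p)) for p in weights if p > 0)
    return float(pn) * base_entropy + mixing


w = mixing_weights(Fraction(1, 2), Fraction(1, 4), Fraction(2, 5), Fraction(1, 5))
print(w, extended_entropy_bound(log2(10), w))
```

Because the mixing term is at most $\log_2 3$ bits, the extended bound inherits the logarithmic scaling of the base construction.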



Similarly, we can extend the range of validity for the construction described in Appendix ?? to the triangular region shown in Fig. A.2 by assigning probabilities $p_0$, $p_1$, and $p_{N/2}$ to the all silent state, the all active state, and the remaining $2N - 2$ states of the original model in total, respectively. The entropy of this extended distribution must be no greater than the entropy of the original distribution (Eq. (F.9)), since the same number of states are active, but now they are not weighted equally, so this remains a low entropy distribution.
FIG. G.1. The full shaded region includes all allowed values for the constraints $\mu$ and $\nu$ for all possible probability distributions, replotted from Fig. A.2. The triangular blue shaded region includes all possible values for the constraints beginning with either of our constructed solutions with $\mu = 1/2$ and $\nu = 1/4$. Choosing other values of $\mu$ and $\nu$ for the construction described in Appendix E would move the vertex to any desired location on the $\nu = \mu^2$ boundary. Note that even with this solution alone, we can cover most of the allowed region.



APPENDIX H: PROOF OF THE LOWER BOUND 
ON ENTROPY FOR ANY DISTRIBUTION 
CONSISTENT WITH GIVEN $\mu$ & $\nu$

Using the concavity of the entropy function, we can derive a lower bound on the minimum entropy. Our lower bound asymptotes to a constant except for the special case $\mu = 1/2$ and $\nu = 1/4$, which is especially relevant for communication systems since it matches the low order statistics of the maximum entropy solution.

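We begin by re-expressing the entropy as follows:

\begin{align}
S(p) &= -\sum_w p(w) \log_2 p(w) \tag{H.1}\\
&= \sum_w p(w)^2\, \frac{1}{p(w)} \log_2 \frac{1}{p(w)} \tag{H.2}\\
&= p^2 \sum_w \frac{p(w)^2}{p^2}\, \frac{1}{p(w)} \log_2 \frac{1}{p(w)}, \tag{H.3}
\end{align}

where p represents the full vector of all $2^N$ state probabilities, so that $p^2 = \sum_w p(w)^2$. Note that $p(w)^2/p^2$ can be thought of as a probability distribution over w since its elements are non-negative and they sum to one. In this form, we can take advantage of the convexity of $x \log_2 x$ by using Jensen's inequality to obtain a lower bound on the entropy:

\begin{align}
S(p) &\geq p^2 \left[\sum_w \frac{p(w)^2}{p^2}\, \frac{1}{p(w)}\right] \times \log_2 \left[\sum_{w'} \frac{p(w')^2}{p^2}\, \frac{1}{p(w')}\right] \tag{H.4}\\
&= p^2 \left[\sum_w \frac{p(w)}{p^2}\right] \log_2 \left[\sum_{w'} \frac{p(w')}{p^2}\right] \tag{H.5}\\
&= \log_2 \frac{1}{p^2}. \tag{H.6}
\end{align}

In the final step we use the fact that $p(w)$ is normalized.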
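The inequality $S(p) \geq \log_2(1/p^2)$ is easy to test numerically. The sketch below (with helper names of our own) draws a few random distributions over 16 states and checks the bound; for a uniform distribution the two sides coincide.

```python
import random
from math import log2


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in p if x > 0)


def collision_bound(p):
    """log2(1 / p^2) with p^2 = sum_w p(w)^2, the right-hand side of Eq. (H.6)."""
    return -log2(sum(x * x for x in p))


random.seed(0)
for _ in range(5):
    raw = [random.random() for _ in range(16)]
    total = sum(raw)
    p = [x / total for x in raw]
    print(entropy(p), collision_bound(p))
```

In each printed pair the first number (the entropy) should be at least as large as the second.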

Now we seek an upper bound on $p^2$. This can be obtained by starting with the matrix representation C of the constraints (for now, we consider each state of the system, $s_i$, as a binary column vector, where i labels the state and each of the N components is either 1 or 0):

\begin{align}
C &= \langle s s^T \rangle \tag{H.7}\\
&= \sum_i p(s_i)\, s_i s_i^T, \tag{H.8}
\end{align}

where C is an $N \times N$ matrix. In this form, the diagonal entries of C, $C_{mm}$, are equal to $\mu_m$ and the off-diagonal entries, $C_{mn}$, are equal to $\nu_{mn}$. Of course, in the symmetric problem we consider here, all diagonal entries are the same, and all off-diagonal entries are the same. We will take this to be the case from this point on.

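For the calculation that follows, it is expedient to represent words of the system as $\tilde{s} \in \{-1, 1\}^N$ rather than $s \in \{0, 1\}^N$ (i.e., $-1$ represents a silent neuron instead of 0). The relationship between the two can be written

$$\tilde{s} = 2s - \mathbf{1}, \tag{H.9}$$

where $\mathbf{1}$ is the vector of all ones. Using this expression, we can relate $\tilde{C}$ to C:

\begin{align}
\tilde{C} &= \langle \tilde{s} \tilde{s}^T \rangle \tag{H.10}\\
&= \langle (2s - \mathbf{1})(2s^T - \mathbf{1}^T) \rangle \tag{H.11}\\
&= 4\langle s s^T \rangle - 2\langle s \rangle \mathbf{1}^T - 2\,\mathbf{1} \langle s^T \rangle + \mathbf{1}\mathbf{1}^T \tag{H.12}\\
&= 4C - 2\langle s \rangle \mathbf{1}^T - 2\,\mathbf{1} \langle s^T \rangle + \mathbf{1}\mathbf{1}^T. \tag{H.13}
\end{align}

For our symmetric case, this reduces to

\begin{align}
\tilde{C}_{mm} &= 1, \tag{H.14}\\
\tilde{C}_{mn} &= 4(\nu - \mu) + 1, \qquad m \neq n. \tag{H.15}
\end{align}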


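Returning to Eq. (H.8) to find an upper bound on $p^2$, we take the square of the Frobenius norm of $\tilde{C}$:

\begin{align}
\|\tilde{C}\|_F^2 &= \mathrm{Tr}(\tilde{C}^T \tilde{C}) \tag{H.16}\\
&= \mathrm{Tr}\left(\sum_{i,j} p(\tilde{s}_i)\, p(\tilde{s}_j)\, \tilde{s}_i \tilde{s}_i^T \tilde{s}_j \tilde{s}_j^T\right) \tag{H.17}\\
&= \sum_{i,j} p(\tilde{s}_i)\, p(\tilde{s}_j)\, \mathrm{Tr}\left(\tilde{s}_i \tilde{s}_i^T \tilde{s}_j \tilde{s}_j^T\right) \tag{H.18}\\
&= \sum_{i,j} p(\tilde{s}_i)\, p(\tilde{s}_j)\, (\tilde{s}_i \cdot \tilde{s}_j)^2 \tag{H.19}\\
&\geq \sum_i p(\tilde{s}_i)^2\, (\tilde{s}_i \cdot \tilde{s}_i)^2 \tag{H.20}\\
&= N^2 \sum_i p(\tilde{s}_i)^2 \tag{H.21}\\
&= N^2 p^2. \tag{H.22}
\end{align}

The final line is where our new representation pays off: in this representation, $\tilde{s}_i \cdot \tilde{s}_i = N$. This gives us the desired upper bound for $p^2$:

$$\frac{\|\tilde{C}\|_F^2}{N^2} \geq p^2. \tag{H.23}$$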

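Using Eqs. (H.16), (H.14), and (H.15), we can express $\|\tilde{C}\|_F^2$ in terms of $\mu$ and $\nu$:

\begin{align}
\|\tilde{C}\|_F^2 &= \sum_m \tilde{C}_{mm}^2 + \sum_{m \neq n} \tilde{C}_{mn}^2 \tag{H.24}\\
&= N + N(N-1)\left(4(\nu - \mu) + 1\right)^2. \tag{H.25}
\end{align}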



Combining this result with Eqs. (H.23) and (H.6), we obtain a lower bound for the entropy for any distribution consistent with any given pair of values for $\mu$ and $\nu$:

$$S(p) \geq \tilde{S}_2^{\mathrm{lo}} = \log_2\left(\frac{N}{1 + (N-1)\,\alpha(\mu, \nu)}\right), \tag{H.26}$$

where $\alpha(\mu, \nu) = (4(\nu - \mu) + 1)^2$.

For large values of N this lower bound asymptotes to a constant

$$\lim_{N \to \infty} \tilde{S}_2^{\mathrm{lo}} = \log_2(1/\alpha), \tag{H.27}$$

unless $\mu = 1/2$ and $\nu = 1/4$, in which case

$$\tilde{S}_2^{\mathrm{lo}} = \log_2(N). \tag{H.28}$$
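For reference, the lower bound of Eq. (H.26) and its two limiting behaviors can be evaluated with a few lines of code. The function names below are illustrative, and the particular values $\mu = 0.1$, $\nu = 0.02$ are chosen only as an example.

```python
from math import log2


def alpha(mu, nu):
    """alpha(mu, nu) = (4 (nu - mu) + 1)^2."""
    return (4 * (nu - mu) + 1) ** 2


def entropy_lower_bound(N, mu, nu):
    """Eq. (H.26): log2( N / (1 + (N - 1) * alpha(mu, nu)) )."""
    return log2(N / (1 + (N - 1) * alpha(mu, nu)))


# Special case mu = 1/2, nu = 1/4: alpha vanishes and the bound is log2(N).
print(entropy_lower_bound(8, 0.5, 0.25))
# Generic case: the bound saturates at log2(1 / alpha) for large N.
print(entropy_lower_bound(10 ** 6, 0.1, 0.02), log2(1 / alpha(0.1, 0.02)))
```

In the special case the bound grows as $\log_2(N)$, which matches the scaling of the construction in Appendix F up to an additive constant.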